Monaural speech enhancement based on gated dilated convolutional recurrent network

Xinyuan YOU, Heng WANG

Journal of Computer Applications 2024, 44 (4): 1317-1324. DOI: 10.11772/j.issn.1001-9081.2023040452

Contextual information plays an important role in speech enhancement. To address the under-utilization of global speech information, a Gated Dilated Convolutional Recurrent Network (GDCRN) for complex spectral mapping was proposed. GDCRN consists of an encoder, a Gated Temporal Convolution Module (GTCM) and a decoder, where the encoder and decoder have asymmetric structures. Firstly, the encoder processes features with a Gated Dilated Convolution Module (GDCM), which expands the receptive field. Secondly, the GTCM captures longer contextual information and passes it on selectively. Finally, the decoder uses deconvolution combined with a Gated Linear Unit (GLU), with each deconvolution layer connected to the corresponding convolution layer in the encoder by a skip connection. Additionally, a Channel Time-Frequency Attention (CTFA) mechanism was introduced. Experimental results show that the proposed network has fewer parameters and a shorter training time than other networks such as the Temporal Convolutional Neural Network (TCNN) and the Gated Convolutional Recurrent Network (GCRN). The proposed GDCRN significantly improves Perceptual Evaluation of Speech Quality (PESQ) by 0.2589 and Short-Time Objective Intelligibility (STOI) by 4.67 percentage points, demonstrating a better enhancement effect and stronger generalization ability.
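The two ideas the abstract leans on, dilation to widen the receptive field and a GLU-style gate to pass information selectively, can be illustrated with a minimal sketch. This is not the authors' GDCM or GTCM: the abstract gives no kernel sizes, dilation rates or layer counts, and the paper operates on complex spectral features rather than a 1-D sequence. The function names, the causal left-padding and the 1-D setting below are all assumptions made for illustration.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Causal 1-D dilated convolution; samples before the start count as zero."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            # The last kernel tap reads x[t]; earlier taps step back by `dilation`.
            idx = t - (k - 1 - i) * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_dilated_conv1d(x, main_kernel, gate_kernel, dilation):
    """GLU-style gating: the main branch is scaled by the sigmoid of a gate branch."""
    main = dilated_conv1d(x, main_kernel, dilation)
    gate = dilated_conv1d(x, gate_kernel, dilation)
    return [m * sigmoid(g) for m, g in zip(main, gate)]


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked dilated convolutions with stride 1."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    signal = [1, 2, 3, 4]
    # Hypothetical kernels; an all-zero gate kernel yields a constant gate of 0.5.
    print(gated_dilated_conv1d(signal, [1, 1], [0, 0], dilation=2))
    # Hypothetical dilation schedule, chosen only to show how the context grows.
    print(receptive_field(3, [1, 2, 4, 8]))
```

With stride 1, each layer adds (kernel size - 1) x dilation samples of context, so increasing the dilation rate enlarges the receptive field without adding weights, while the sigmoid gate decides how much of each output is passed to the next layer.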
